Abstract — Test engineering for fault-tolerant VLSI systems
must optimize hardware overhead, test power and test time.
The quality of these complex high-speed VLSI circuits can be
assured only through delay testing, which checks for correct
temporal behavior. In the present paper, a data-path based
built-in test pattern generator (TPG) that generates iterative
pseudo-exhaustive two-patterns (IPET) for parallel delay
testing of modules with different input cone capacities is
implemented. Further, a CMOS implementation of a low power
architecture (LPA) for scan based built-in self test (BIST),
covering both delay testing and combinational testing, is
carried out. The LPA reduces test power dissipation in the
circuit under test (CUT). Experimental results and
comparisons with pre-existing methods show a reduction in
hardware overhead and test time.

Index Terms— delay testing, parallel BIST, digital VLSI, FPGA, 
fault tolerance, low power. 

I. Introduction 

Current systems on chip (SoCs) are high-performance multi-core
heterogeneous systems, which include reconfigurable resources
such as field programmable gate arrays (FPGAs) to increase
design flexibility. These high-speed SoC designs, fabricated
in deep-submicron and smaller technologies, are prone to
process variations and large spatial variations caused by
intrinsic effects such as line-edge roughness, random dopant
fluctuations and body thickness fluctuations. These variations
lead to large spreads in circuit delay and power, making the
designs more susceptible to delay faults of the order of
picoseconds.

Delay faults affect the timing of the circuit at its operating
speed. These faults can limit the continued scaling of silicon
fabrication technologies. Design for testability (DFT) and
fault-tolerant design for delay testing need to be addressed
at the early stages of the VLSI design cycle. Path delay and
transition delay are the most popular delay fault models, both
of which require a two-pattern test for delay fault detection.
The test requires a pair of vectors {T1, T2} to be applied at
speed, where T1 initializes the target node/path and T2
launches the appropriate transition and propagates it to an
observable point.

Iterative structures suited for VLSI implementation, such as
multipliers, digital signal processing systems, bit-sliced
microprocessors and embedded memories, require IPET for
delay testing. Iterative pseudo-exhaustive one-pattern
generators are proposed in [1] and [2]. No scheme for
accumulator based IPET generation exists except in [3].
In this paper, less complex circuits already existing in the
SoC are utilized for efficient BIST applications. The
implementation of a built-in test pattern generator (TPG) for
generating iterative pseudo-exhaustive two-patterns (IPET)
is discussed. This TPG enables testing of iterative logic arrays
as well as VLSI circuits partitioned for pseudo-exhaustive
testing. The design aims at reducing test time and the hardware
overhead incurred due to DFT while achieving optimal fault
coverage.

Very few dynamic test power reduction methods for scan
based delay testing are discussed in the available
literature [4]. In [5], a method for minimizing peak power
consumption during scan testing using X-filling heuristics is
explained. In [6], a method for reducing the test power in scan
BIST using a dynamic clock control circuit is presented. Hence,
in the present paper a CMOS implementation of a low power
architecture (LPA) for scan based delay testing as well as
combinational fault testing of VLSI systems is carried out.
Further, FPGA implementations of these architectures are
carried out.

The structure of the paper is as follows: in Section II, the
preliminaries for pseudo-exhaustive testing and delay testing
are given. Sections III, IV and V outline the architectures and
their analytical basis. Section VI presents the discussion of
experimental results. Finally, Section VII summarizes the
methodology and the scope for future work.

II. Background 

Many previous works show that external testing techniques
using automatic test equipment (ATE) and pseudo-random
testing, as practiced by the industry, are inefficient for
at-speed delay testing. An alternative to pseudo-random
testing is exhaustive testing, which requires 2^(2n) test
patterns for delay testing, where 'n' is the number of inputs
of the VLSI circuit. Clearly, true exhaustive testing is
impractical for large VLSI circuits. Partitioning to allow
exhaustive testing of sub-circuits offers an attractive
alternative, which leads to the pseudo-exhaustive testing
approach [1].

Pseudo-exhaustive testing provides 100% fault coverage
for detectable faults. For an 'n' input CUT with 'm' outputs
and cone-size 'k', pseudo-exhaustive testing involves
applying an exhaustive test to each of the 'm' output cones.
In many cases, each output of an (n, m, k) CUT depends only on
a subset of the primary inputs. A segment with a single output
is called a cone. Here, the dependency set 'k' of an output
represents the number of inputs on which that output depends.
This CUT, partitioned into 'm' cones with cone-size 'k', can be
tested in parallel, thus reducing the total test time and test
vector length. Parallel testing can be achieved by inserting
suitable DFT logic at the partitioned points [7].

Figure 1. Architecture of the TPG

In CMOS VLSI circuits, dynamic power contributes
significantly to total power dissipation. Dynamic power results
from the activity of a circuit in changing its states due to the
charging and discharging of the effective capacitive loads.
The dynamic power 'P' dissipated at a node is given by:

$$P = \frac{1}{2}\, c\, V^{2}\, \alpha\, f \qquad (1)$$

where 'c' is the capacitance of the node, 'V' is the supply
voltage, 'f' is the clock frequency and 'α' is the node activity
factor. Circuits are often designed to operate in normal and
test modes. Power dissipation is higher in test mode than in
normal mode, especially if a scan mechanism is employed.
This is due to the random combinational logic activity in the
CUT caused by scanning test data in and out, which increases
the power consumption and test time. Yield loss can result if
this consumption is higher than that of the normal functional
operation for which the circuit is designed.
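
As a rough numerical illustration of Equation (1), read here as P = 1/2 · c · V^2 · α · f, the sketch below compares a node in normal mode and in scan mode. The load capacitance, supply voltage, clock frequency and both activity factors are hypothetical values chosen only to show how the activity factor scales the result; they are not taken from the paper.

```python
def dynamic_power(c_farads, v_volts, f_hertz, alpha):
    """Dynamic power at one node, P = 1/2 * c * V^2 * alpha * f, in watts."""
    return 0.5 * c_farads * v_volts ** 2 * alpha * f_hertz


# Hypothetical node: 10 fF load, 1.8 V supply, 100 MHz clock.
normal = dynamic_power(10e-15, 1.8, 100e6, alpha=0.1)  # assumed functional activity
scan = dynamic_power(10e-15, 1.8, 100e6, alpha=0.4)    # assumed scan-shift activity

print(f"normal mode: {normal * 1e6:.3f} uW")
print(f"scan mode:   {scan * 1e6:.3f} uW")
print(f"ratio:       {scan / normal:.1f}x")
```

Because P is linear in the activity factor, any technique that lowers switching activity during scan lowers test power in the same proportion.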

III. The TPG Architecture 

In IPET, the entire 'n' bit space is exhaustively covered if,
for all 'n-k+1' adjacent 'k' bit subspaces, all possible
two-patterns occur at least once.

©2013 ACEEE
DOI01.DCSI.4.2.1181

The TPG shown in Fig. 1 is a module with n inputs E[n:1],
which are used to select the cone-size 'k' of the IPET
generated at the outputs of the accumulator. The TPG consists
of two parts: a data-path module and a control module. The
control module controls the entire IPET generation process.
The selective counter is designed so that its increment value
in each clock cycle for every signal E[k] is as shown in
Table I. The n-stage accumulator accumulates the output of the
selective counter. Depending on the values of E[k], the n-stage
selective counter and accumulator are reconfigured to work as
k-sub-stage counters and k-sub-stage accumulators
respectively. The carry input of each sub-accumulator is
driven by the carry output of the preceding sub-accumulator.
The E[k] signals and the carry outputs of the n-stage generic
accumulator form the inputs of the carry generator module. In
this module, when E[k] is enabled, the corresponding Cout[k]
signal of the n-stage accumulator is propagated as carry input
(Cin) to the accumulator. As depicted in Fig. 2, the control
module detects specific states of the selective counter and
accumulator to generate the required control signals for the
entire architecture.
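
The coverage condition stated at the start of this section can be checked mechanically. The following sketch is not part of the TPG; it is a hypothetical checker that takes "two-pattern" to mean an ordered pair of distinct k-bit vectors applied on consecutive clocks (an assumption) and verifies that every one of the n-k+1 adjacent k-bit windows sees all such pairs. The helper `replicate` and the brute-force stimulus are illustrative only.

```python
from itertools import permutations


def covers_all_two_patterns(vectors, n, k):
    """True if every adjacent k-bit window of the n-bit vectors sees
    every ordered pair (a, b), a != b, on two consecutive clock cycles."""
    mask = (1 << k) - 1
    wanted = set(permutations(range(1 << k), 2))
    for offset in range(n - k + 1):          # the n-k+1 adjacent windows
        seen = set()
        for prev, curr in zip(vectors, vectors[1:]):
            a = (prev >> offset) & mask
            b = (curr >> offset) & mask
            if a != b:
                seen.add((a, b))
        if seen != wanted:
            return False
    return True


def replicate(pattern, n, k):
    """Repeat a k-bit pattern across an n-bit word (hypothetical helper)."""
    word = 0
    for shift in range(0, n, k):
        word |= pattern << shift
    return word & ((1 << n) - 1)


# Brute-force stimulus: apply every ordered pair (a, b) back to back.
n, k = 6, 3
stimulus = []
for a, b in permutations(range(1 << k), 2):
    stimulus += [replicate(a, n, k), replicate(b, n, k)]

print(covers_all_two_patterns(stimulus, n, k))       # True
print(covers_all_two_patterns(stimulus[:10], n, k))  # False
```

Replicating the same k-bit pattern in every sub-block works because each shifted window then holds a rotation of that pattern, and rotation maps the set of all k-bit pairs onto itself.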

After each (n,k) pseudo-exhaustive two-pattern test is
complete, the 'endtest' signal is generated and 'k' is
incremented while k ≤ n to iterate the two-patterns. For this
purpose, the 'endtest' signal clocks a ⌈log2 n⌉ stage counter
which drives the inputs of a ⌈log2 n⌉:n decoder. General
pseudo-exhaustive two-patterns are obtained by excluding
these two blocks.
Table I. Functionality of TPG Generator

IV. Control Module Operation

The operation of the control module, which forms the crux of
the design, is given below. It is based on the two-pattern
test algorithm proposed in [8]. The algorithm generates all
n-bit two-pattern test transitions within 2^n × (2^n - 1) + 1
cycles. The algorithm consists of three steps:

• Generation of n-bit s-circle(2^n - 3) starting from
(2^n - 1)

• Generation of n-bit circle(2^n - 2) starting from
(2^n - 1)

• Generation of zero-transitions

where,

• S-circle(i) - generates vectors starting from A and performing
consecutive sequence(i) until returning to A after the final
step(i) of sequence(i).

• Step(i) - generates a transition from number A to number B
such that B = (A+i) mod N.

• Circle(i) - generates vectors originating from A and
performing consecutive step(i) until returning to number A.

• Sequence(k) - generates vectors starting from A and
performing consecutive step(i) such that i = 1, 2, 3, ..., k.
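
The four primitives defined above can be written down directly. The sketch below follows the definitions literally, using plain modulo-N addition as in the definition of step(i); it does not model the carry feedback of the accumulator described in Section III, so it illustrates the vocabulary rather than the exact hardware sequence. The function and parameter names are illustrative.

```python
def step(a, i, modulus):
    """Step(i): move from A to B = (A + i) mod N."""
    return (a + i) % modulus


def circle(a, i, modulus):
    """Circle(i): repeat step(i) from A until A is reached again."""
    vectors, current = [], a
    while True:
        current = step(current, i, modulus)
        vectors.append(current)
        if current == a:
            return vectors


def sequence(a, k, modulus):
    """Sequence(k): apply step(1), step(2), ..., step(k) starting from A."""
    vectors, current = [], a
    for i in range(1, k + 1):
        current = step(current, i, modulus)
        vectors.append(current)
    return vectors


def s_circle(a, i, modulus):
    """S-circle(i): repeat sequence(i) until a sequence ends back on A."""
    if i < 1:
        raise ValueError("s_circle needs at least one step per sequence")
    vectors, current = [], a
    while True:
        vectors += sequence(current, i, modulus)
        current = vectors[-1]
        if current == a:
            return vectors


print(circle(0, 1, 4))     # [1, 2, 3, 0]
print(sequence(0, 3, 8))   # [1, 3, 6]
print(s_circle(0, 2, 4))   # [1, 3, 0, 2, 3, 1, 2, 0]
```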

Assuming n=4, k=3, N=2^n and K=2^k, the control module
operates in the following phases (refer Fig. 2):

• Phase 1: In this phase, the selective counter starts counting
in each clock cycle and the accumulator accumulates the count
value. This continues until the selective counter reaches
K-3=5. When the selective counter value is 5 (fifth clock
cycle), it is reset and the count is resumed in the sixth clock
cycle. For this purpose, the control logic should generate the
'resetc' control signal. The accumulator continues the
accumulation process and enters phase 2.

• Phase 2: In this phase, when the accumulator value is N-1=15
and the selective counter value is K-3=5 (35th clock cycle),
the selective counter value is incremented by one and the
counter is then disabled. The algorithm enters phase 3. The
control module has to generate the 'hold' signal to disable the
counter.

• Phase 3: In this phase, when the accumulator again holds the
value N-1=15 (42nd clock cycle), the clock of the selective
counter is divided by 2. The accumulator continues to
accumulate and is reset to zero every second clock cycle.
During this phase, the control module has to generate the
accumulator reset signal 'reseta' and a 'dividebytwo' signal
for the selective counter. This process continues until the
selective counter and the accumulator reach K-1=7 (56th clock
cycle). For this condition, the control module has to generate
the 'endtest' signal, to indicate the end of the (n,k)
pseudo-exhaustive two-pattern test.

Figure 2. Flowchart of control module operation
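
The three phases can be simulated for a single k-bit sub-accumulator. The sketch below is a behavioural model, not the authors' RTL. It assumes that the carry output of the k-bit adder is fed back as its carry input (end-around carry), which is one reading of the carry generator described in Section III. Under that assumption the k = 2 run reproduces the thirteen (n=6, k=2) vectors listed in Table III, and the k = 3 run returns to all-ones at clock cycles 35, 42 and 56, matching the milestones quoted in the phases above. Function names are illustrative.

```python
def end_around_add(a, b, k):
    """k-bit addition with the carry-out fed back as carry-in (assumed)."""
    total = a + b
    return (total & ((1 << k) - 1)) + (total >> k)


def two_pattern_vectors(k):
    """Return the k-bit vector sequence of the three phases (k >= 2)."""
    K = 1 << k
    acc = K - 1
    vectors = [acc]                      # initial vector, clock cycle 0
    # Phase 1: counter runs 1..K-3 and is reset ('resetc') each time.
    while True:
        for count in range(1, K - 2):
            acc = end_around_add(acc, count, k)
            vectors.append(acc)
        if acc == K - 1:
            break
    # Phase 2: counter held ('hold') at K-2 until the accumulator returns.
    while True:
        acc = end_around_add(acc, K - 2, k)
        vectors.append(acc)
        if acc == K - 1:
            break
    # Phase 3: accumulator reset ('reseta') every second cycle.
    for value in range(1, K):
        vectors += [0, value]
    return vectors


print([format(v, "02b") for v in two_pattern_vectors(2)])
v3 = two_pattern_vectors(3)
print(len(v3), v3[35], v3[42], v3[56])   # 57 7 7 7
```

In this model every pair of consecutive vectors is a distinct two-pattern, so the 2^k × (2^k - 1) transitions are covered in 2^k × (2^k - 1) + 1 vectors.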

V. Low Power Architecture for Delay Testing 

In the present work, the low power architecture (LPA)
reduces the number of switching transitions that propagate
into the CUT, subsequently reducing the dynamic power.
This is achieved by modifying the scan chain using a separate
C-chain of flip-flops and a logic array of XOR gates and AND
gates, as shown in Fig. 3 [9]. Test data is transferred from the
C-chain to the D-register cells. Cells in the C-chain change
only in the shift mode. In this mode, the enable signal (clke)
is 0. Therefore, transitions in the C-chain are masked by the
enable signal (clke) at the AND gate. The transitions can
only affect the XOR gate. This causes a small number of
transitions at the data register outputs and hence low test
power dissipation in the CUT. As most flip-flops provide
both Q and its complement Q', XOR functionality can be
provided by using two transistors in a pass-transistor based
structure (refer Fig. 11). The effects of this implementation
on power consumption are negligible due to the pass-transistor
based structure. As shown in Fig. 11, the logic required for
providing the enable signal (clke) requires only six
transistors.

Full Paper
ACEEE Int. J. on Control System and Instrumentation, Vol. 4, No. 2, June 2013

Figure 3. The low power architecture (blocks: C register chain
with Cin, Cout and outputs C1 to C4; AND-XOR array;
enable/select; D register chain with Din and Dout; control unit)

Testing a delay fault requires the application of different
vectors at two consecutive clocks. Unlike conventional scan
cells, the LPA allows the at-speed application of arbitrary
vector pairs to the CUT in two consecutive clocks, which
enables testing of delay faults. This is accomplished as
follows: assume that the test generated for a delay fault
includes the vector pair (V1, V2). First, the D-register chain
is loaded with V1. Then, the difference vector TD that is
required for changing the D vector V1 to V2 is shifted into
the C-chain. At the next clock edge, the D vector V1 is
changed to V2. Furthermore, the C-chain hardware is designed
to take advantage of similar adjacent test data. Reconstruction
of the original test vectors from the difference information is
possible using the LPA. Consequently, this attribute makes
the architecture suitable for algorithms that are based on
compressing test vectors. The LPA can reduce test time and
power in conjunction with test reformatting techniques. Test
data compression can additionally decrease test time and
memory requirements.
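
The vector-pair mechanism can be illustrated at bit level. The sketch below assumes that each D-register bit toggles when its C-chain bit, gated by clke through the AND gate, is 1 at the XOR input; this is one reading of the AND-XOR array in Fig. 3, not a transcription of the circuit. The 4-bit vectors and the function names are hypothetical.

```python
def difference_vector(v1, v2):
    """TD has a 1 wherever V1 and V2 differ (bitwise XOR)."""
    return v1 ^ v2


def d_register_next(d, c_chain, clke):
    """Next D-register value: each C bit is ANDed with clke, then XORed in."""
    gated = c_chain if clke else 0
    return d ^ gated


v1, v2 = 0b1011, 0b0110
td = difference_vector(v1, v2)

d = v1
d = d_register_next(d, td, clke=0)   # shift mode: C-chain activity is masked
held = d
d = d_register_next(d, td, clke=1)   # launch: D register toggles to V2

print(f"TD={td:04b} held={held:04b} launched={d:04b}")
# TD=1101 held=1011 launched=0110
```

While clke is 0 the D register keeps presenting V1 to the CUT regardless of what is shifting through the C-chain, which is where the reduction in switching activity comes from in this model.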

VI. Simulation Environment and Results 

The discussed architectures are synthesized using the Cadence
Encounter tool, and technology mapping is done based on a
0.18 micrometer technology library. Comparisons of test
length between the exhaustive two-pattern generator and the
pseudo-exhaustive two-pattern generator, for various values
of 'n' and 'k' and for optimal fault coverage, are given in
Table II.

A reduction in test-vector length results in reduced test time
and test power. Test length depends on cone-size, regardless
of the VLSI network size. From the generated patterns depicted
in Table III, it can be inferred that identical test patterns
are generated with different cone sizes depending on the value
of E[k]. Hence the TPG can be used to test systems with
varying cone-sizes, thus easily varying the degree of
parallelism for testing.

Table II. Test-Length Comparisons

n    Exhaustive     (n,k)   Pseudo-exhaustive   % reduction in
     two-pattern            two-pattern         length of test
     test length            test length         vectors
4    256            4,2     13                  94.92
5    1024           5,3     57                  94.43
6    4096           6,2     13                  99.68
7    16384          7,3     57                  99.65
8    65536          8,2     13                  99.98
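
The entries of Table II follow from two closed-form lengths. The sketch below assumes the exhaustive two-pattern length is 2^(2n), as stated in Section II, and that the pseudo-exhaustive length for cone-size k is 2^k × (2^k - 1) + 1, which is the cycle count of Section IV applied to a k-bit cone. The last row is taken as n = 8, k = 2. Function names are illustrative.

```python
def exhaustive_length(n):
    """Exhaustive two-pattern test length for an n-input circuit."""
    return 2 ** (2 * n)


def pseudo_exhaustive_length(k):
    """Two-pattern test length for cones of size k."""
    return 2 ** k * (2 ** k - 1) + 1


def reduction_percent(n, k):
    """Percentage reduction in test length relative to exhaustive testing."""
    full = exhaustive_length(n)
    return 100.0 * (full - pseudo_exhaustive_length(k)) / full


for n, k in [(4, 2), (5, 3), (6, 2), (7, 3), (8, 2)]:
    print(n, k, exhaustive_length(n), pseudo_exhaustive_length(k),
          f"{reduction_percent(n, k):.2f}")
```

The pseudo-exhaustive length depends only on k, which is why the rows with k = 2 all need 13 vectors and the rows with k = 3 all need 57, whatever the value of n.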

The control module is designed by representing its
functionality as a finite state machine (FSM). Design entry
is done through a behavioral description of its functionality
in the Verilog hardware description language (HDL). Synthesis
and optimization for different values of 'n' are carried out
using the Cadence Encounter tool. The carry generator module
is designed with 'n' AND gates and an OR gate. The 'm to n'
decoder and 'm' stage counter are designed with '2n' gates and
'm' D flip-flops. The selective counter design is obtained by
modifying the D-registers in the data-path with 'n' 2:1
multiplexers and '2n' OR gates [3]. The RTL schematics of the
synthesized modules and the simulation results are shown in
Fig. 3-6.

Figure 7. Signature obtained for fault free s27 benchmark circuit

Figure 8. RTL schematic of selective counter

Very few IPET generators have been proposed against which
hardware overhead can be compared. Hence, one option is to
multiplex the outputs of two pseudo-exhaustive one-pattern
generators proposed in [1] and [2]. Hardware overhead
comparisons are made assuming that an XOR gate consists of
4 gate equivalents, a D flip-flop consists of 8 gate
equivalents and a multiplexer consists of 3 gate equivalents.
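
The gate-equivalent weights above can be turned into a quick estimate of the logic added around the data-path. The sketch below only tallies the component counts given in the text for the carry generator, the decoder and counter, and the selective counter. It assumes a simple AND or OR gate counts as 1 gate equivalent and that m = ⌈log2 n⌉; both are assumptions, and the result is not meant to reproduce the figures of Table IV.

```python
import math

# Gate-equivalent (GE) weights; XOR, DFF and MUX are from the text,
# the weight of 1 for a simple AND/OR gate is an assumption.
GE = {"gate": 1, "xor": 4, "dff": 8, "mux": 3}


def added_logic_ge(n):
    """Estimate the GE count of the logic added around the data-path."""
    m = math.ceil(math.log2(n))                     # counter/decoder stages
    carry_generator = (n + 1) * GE["gate"]          # n AND gates + 1 OR gate
    decoder_counter = 2 * n * GE["gate"] + m * GE["dff"]
    selective_counter = n * GE["mux"] + 2 * n * GE["gate"]
    return carry_generator + decoder_counter + selective_counter


for n in (4, 8, 16):
    print(n, added_logic_ge(n))
```

Under these assumptions the added logic grows roughly linearly in n, since only the counter contributes a logarithmic term.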

From Table IV, it can be concluded that, compared with
pre-existing works, there is a reduction in hardware overhead
for the proposed TPG design. The data-path based TPG is
realized on a Spartan 6E field programmable gate array (FPGA)
target device. The synthesis results show very low device
utilization. Hence, this TPG can aid the BIST of FPGA systems.
The reprogrammability of FPGAs can be utilized to create the
BIST logic. The FPGA can be configured as parallel 1-D
iterative logic arrays (ILA). By repeated reconfiguration,
comprehensive fault coverage can be obtained. The BIST logic
ceases to exist when the circuit is reconfigured for its normal
system function. Thus, testability can be achieved without
any area overhead or performance degradation.

Figure 9. FPGA implementation of control module

Figure 10. Schematic obtained in Cadence for the control module

Table III. A Part of IPET Generated by TPG

OUTPUT (n=6, k=2)   Count   OUTPUT       Count   OUTPUT       Count
(11)(11)(11)        1       (010)(010)   11      (010)(010)   35
(01)(01)(01)        2       (011)(011)   12      (111)(111)   36
(10)(10)(10)        3       (101)(101)   13      (110)(110)   37
(11)(11)(11)        4       (001)(001)   14      (101)(101)   38
(10)(10)(10)        5       (101)(101)   15      (100)(100)   39
(01)(01)(01)        6       (001)(001)   16      (001)(001)   40
(11)(11)(11)        7       (100)(100)   17      (010)(010)   41
(00)(00)(00)        8       (110)(110)   18      (001)(001)   42
(01)(01)(01)        9       (010)(010)   19      (111)(111)   43
(00)(00)(00)        10      (110)(110)   20      (000)(000)   44
(10)(10)(10)        11      (100)(100)   21      (001)(001)   45
(00)(00)(00)        12      (101)(101)   22      (000)(000)   46
(11)(11)(11)        13      (111)(111)   23      (010)(010)   47
n=6, k=3                    (011)(011)   24      (000)(000)   48
(111)(111)          1       (111)(111)   25      (000)(001)   49
(001)(001)          2       (101)(101)   26      (000)(000)   50

Table IV. Hardware Overhead Comparisons

The proposed LPA does not increase the delay during
normal operation or during delay fault testing. The generated
test vectors are applied to the s27 benchmark circuit and the
output of the CUT is applied to a signature analyzer. From the
generated signature, any stuck-at faults in the CUT can be
identified. From the simulation results shown in Fig. 6 and
Fig. 7, the difference in signature for the fault-free and
fault-injected circuit is observed. Fig. 12 and Fig. 13 show
the TSPICE simulation waveforms obtained from the CMOS
implementation of the LPA with three scan cells. This
simulation is for a 0.25 micrometer CMOS technology library
and a supply voltage of 3 V. Dout (Vq1) is the output of the
data register. TD (Cin1) is the serial input of the C-chain.
C1 (Cout1) and C2 (Cout2) are the outputs of the two C-chain
triggering registers. In this simulation, after shifting test
data (Cin1) to the C-chain, the new test vector is triggered in
the D register chain when clke (clkenable) is active. The
disadvantage of the LPA is the increased area overhead due to
the AND-XOR logic.
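
Signature analysis of the CUT response can be sketched with a small serial signature register. The paper does not specify the analyzer structure, so the 4-bit width and the feedback tap positions below are illustrative assumptions, and the response stream is hypothetical rather than an actual s27 output.

```python
def signature(bits, width=4, taps=(3, 0)):
    """Compact a serial response stream into a width-bit signature.
    The register width and tap positions are illustrative assumptions."""
    state = 0
    for bit in bits:
        feedback = bit
        for tap in taps:
            feedback ^= (state >> tap) & 1
        state = ((state << 1) | feedback) & ((1 << width) - 1)
    return state


fault_free = [1, 0, 1, 1, 0, 0, 1, 0]   # hypothetical CUT response
faulty = list(fault_free)
faulty[2] ^= 1                          # one response bit flipped by a fault

print(signature(fault_free) != signature(faulty))   # True
```

Because the register is linear and its update is invertible when the most significant bit is tapped, a single flipped response bit always changes the final signature in this model. Multiple errors can still cancel, which is the usual aliasing limitation of signature compaction.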

VII. Conclusion

In VLSI circuits, utilization of the same TPG for BIST of
modules with different cone sizes drives down the DFT
hardware overhead and DFT cost. The data-path based TPG
presented in the current paper provides optimal fault coverage,
at-speed parallel testing, reduced hardware overhead and
reduced test time. In the future, this TPG will be used to aid
the design of fault-tolerant self-checking VLSI systems. The
implementation of the LPA for delay testing demonstrates its
improved performance in terms of test time and test power.
In the future, the LPA performance will be improved by
applying efficient compression algorithms to the test data
vectors.